Abstract: We consider the nonparametric estimation of the density function of weakly and strongly dependent processes with noisy observations. We show that in the ordinary smooth case the optimal bandwidth choice can be influenced by long range dependence, as opposed to the standard case, when no noise is present. In particular, if the dependence is moderate, the bandwidth, the rates of mean-square convergence and, additionally, the central limit theorem are the same as in the i.i.d. case. If the dependence is strong enough, then the bandwidth choice is influenced by the strength of dependence, which is different when compared to the non-noisy case. Also, the central limit theorem is influenced by the strength of dependence. On the other hand, if the density is supersmooth, then long range dependence has no effect at all on the optimal bandwidth choice.
1. Introduction

The nonparametric estimation of the density function for dependent sequences has attracted many researchers in the past. We do not claim to provide a full overview of this topic; however, the results can be summarized as follows. In the case of weak dependence, the results on the (mean square error) optimal bandwidth choice, the optimal rates of convergence for the mean square error and the central limit theorems for the Parzen-Rosenblatt kernel estimator are exactly the same as in the i.i.d. case (see e.g. [3] or [28, Theorem 1]). The situation is a bit more complicated for long-range dependent sequences. Although dependence has no influence on the optimal bandwidth choice, the rates of mean-square convergence may differ according to whether the dependence is very strong or moderate. In the latter case they are the same as in the i.i.d. situation. We refer to [5, 8, 13, 14] and [ ]. Similarly, if the bandwidth is "small", then the central limit theorem for the kernel density estimates is the same as in the i.i.d. case. On the other hand, if the bandwidth is "big" enough, then the long range dependence effect dominates (see [6], [28, Theorem 2]). Similar phenomena occur in random-design regression problems. The reader is referred to [22] and [24] for up-to-date results and references for kernel and local linear estimation, respectively.

As for smooth estimators of the distribution function, neither short nor long range dependence has any influence on the optimal bandwidth choice: the optimal bandwidth is the same as in the i.i.d. case. However, the optimal rates of convergence for the mean square error are always affected by long range dependence (see [8] for more details).

In the present paper we consider the deconvolution problem for dependent sequences. Suppose that we have $n$ observations $Y_1, \ldots, Y_n$ available. We want to estimate the unknown density $f = f_X$ of a random variable $X$, where $Y = X + \epsilon$, with a measurement error $\epsilon$ of a known distribution $F_\epsilon$ and density $f_\epsilon$. It is assumed that $X$ and $\epsilon$ are independent and that $\{\epsilon, \epsilon_j, j \ge 1\}$ is an i.i.d. sequence.

We will estimate $f(x_0)$ using the classical estimator (cf. [4, 9])
$$\hat f_n(x_0) = \frac{1}{nh_n}\sum_{j=1}^{n} g_n\Big(\frac{x_0 - Y_j}{h_n}\Big), \qquad g_n(x) = \frac{1}{2\pi}\int_{\mathbb R}\exp(-itx)\,\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\,dt.$$
Above, $\phi_\epsilon$ is the characteristic function corresponding to the density $f_\epsilon$, and $\phi_K(t) = \int_{\mathbb R}\exp(itx)K(x)\,dx$. The mean square error is defined as
$$\mathrm{MSE}(f, h_n) := E\big(\hat f_n(x_0) - f(x_0)\big)^2.$$
We also study the behavior of the estimator
$$\hat F_n(x_0) = \frac{1}{n}\sum_{j=1}^{n} G_n\Big(\frac{x_0 - Y_j}{h_n}\Big), \qquad G_n(x) = \int_{-\infty}^{x} g_n(u)\,du,$$
of the distribution function $F$.

In the i.i.d. case deconvolution problems were studied in [4, 9, 10] and [25], among others. In the latter paper, Fan provided the optimal rates of convergence for $\mathrm{MSE}(f, h_n)$ in both the ordinary smooth and the supersmooth case. As for the weakly dependent case, previous results have been obtained under various mixing conditions (see [18, 19, 20]) and under association (see [21]). The principal message of these papers is that the results (optimal bandwidth, optimal rates, central limit theorem) for weakly dependent sequences are the same as in the i.i.d. case. As for the distribution function, the problem was studied in [9] in the i.i.d. case and in [17] in the dependent case.

However, mixing is rather hard to verify and requires additional assumptions. In particular, let $\{Z, Z_i, i \in \mathbb Z\}$ be a centered sequence of i.i.d. random variables and consider the class of stationary linear processes
$$X_i = \sum_{k=0}^{\infty} c_k Z_{i-k}, \tag{1}$$
where $\sum_{k=0}^{\infty} c_k^2 < \infty$.
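To make the estimator $\hat f_n$ concrete, here is a minimal sketch under assumptions that are ours and not the paper's: a standard Gaussian kernel $K$ and Laplace errors with scale $s$, so that $\phi_\epsilon(t) = 1/(1+s^2t^2)$. In this special case $1/\phi_\epsilon(t/h_n) = 1 + (s/h_n)^2t^2$, and the integral defining $g_n$ reduces to $g_n(x) = K(x) - (s/h_n)^2K''(x) = K(x)\{1 - (s/h_n)^2(x^2-1)\}$. The function names `g_n` and `f_hat` are hypothetical, and the i.i.d. Gaussian $X$ in the demo is only for illustration.

```python
import numpy as np

def g_n(x, h, s):
    """Deconvoluting kernel for a standard Gaussian K and Laplace(0, s) errors.

    Since 1/phi_eps(t/h) = 1 + (s/h)^2 t^2, g_n = K - (s/h)^2 K'', and K'' = (x^2 - 1) K.
    """
    k = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return k * (1.0 - (s / h) ** 2 * (x**2 - 1.0))

def f_hat(x0, y, h, s):
    """Deconvolution kernel estimate of f_X(x0) from noisy data y = x + eps."""
    y = np.asarray(y, dtype=float)
    return g_n((x0 - y) / h, h, s).sum() / (y.size * h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, s, h = 50_000, 0.3, 0.3
    x = rng.standard_normal(n)               # i.i.d. X, for illustration only
    y = x + rng.laplace(0.0, s, size=n)      # Laplace(0, s) measurement error
    print(f_hat(0.0, y, h, s))               # approx. the N(0, 1 + h^2) density at 0, up to sampling error
```

For this kernel, $E\hat f_n(x_0) = (K_{h_n} * f_X)(x_0)$, so the demo targets the $N(0, 1+h^2)$ density at 0 (about 0.382) rather than $f_X(0) \approx 0.399$; the gap is the $O(h_n^2)$ bias.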
To obtain strong mixing for linear processes, both regularity of the density of $Z_1$ and some constraints on the $c_k$'s are required (see e.g. [7]). On the other hand, association requires that all $c_k$ are positive. To overcome such problems, the martingale-based method has been proposed, and it works surprisingly well in a variety of problems, not necessarily connected with nonparametric estimation (see [15, 16, 26, 27, 28]). Thus, from a technical point of view, assuming that $c_0 = 1$ and that the sequence $c_k$, $k \ge 0$, is summable (referred to later as short range dependence, SRD), we extend Masry's results to moving averages, without referring to mixing or association at all.

However, the more interesting problem is the influence of long range dependence on the deconvolution estimator. To deal with it, we assume that $c_0 = 1$ and that $c_k$ is regularly varying with index $-\gamma$, $\gamma \in (1/2, 1)$. This means that $c_k \sim k^{-\gamma}L_0(k)$ as $k \to \infty$, where $L_0$ is slowly varying at infinity. We refer to all such models as long range dependent (LRD) linear processes. In particular, if the variance exists, then the covariances $\rho_k := EX_0X_k$ decay at the hyperbolic rate $\rho_k = L(k)k^{-(2\gamma-1)}$, where $\lim_{k\to\infty}L(k)/L_0^2(k) = B(2\gamma-1, 1-\gamma)$ and $B(\cdot,\cdot)$ is the beta function. Consequently, the covariances are not summable.
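The constant $B(2\gamma-1, 1-\gamma)$ can be checked numerically on a standard example that we pick for illustration: the fractionally integrated process $X_i = (1-B)^{-d}Z_i$ with $d = 1-\gamma$ and $EZ^2 = 1$, whose coefficients $c_k = \Gamma(k+d)/(\Gamma(d)\Gamma(k+1))$ and autocovariances $\rho_k = \Gamma(1-2d)\Gamma(k+d)/(\Gamma(d)\Gamma(1-d)\Gamma(k+1-d))$ are known in closed form. The sketch computes $L_0(k) = c_kk^{\gamma}$ and $L(k) = \rho_kk^{2\gamma-1}$ through log-gamma functions and compares $L(k)/L_0^2(k)$ with the beta function; `ratio_L_over_L0sq` is a hypothetical helper name.

```python
from math import exp, lgamma

def beta_fn(a, b):
    """Beta function B(a, b) via log-gamma."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def ratio_L_over_L0sq(k, gamma):
    """L(k) / L_0(k)^2 for FARIMA(0, d, 0) with d = 1 - gamma and unit innovation variance."""
    d = 1.0 - gamma
    log_c = lgamma(k + d) - lgamma(d) - lgamma(k + 1)              # log c_k
    log_rho = (lgamma(1 - 2 * d) + lgamma(k + d)
               - lgamma(d) - lgamma(1 - d) - lgamma(k + 1 - d))     # log rho_k
    # rho_k k^(2 gamma - 1) / (c_k^2 k^(2 gamma)) simplifies to rho_k / (c_k^2 k)
    return exp(log_rho - 2.0 * log_c) / k

for gamma in (0.6, 0.75, 0.9):
    print(gamma, ratio_L_over_L0sq(10**6, gamma), beta_fn(2 * gamma - 1, 1 - gamma))
```

Working on the log scale avoids overflow of $\Gamma(k+d)$ for large $k$; the relative discrepancy decays like $1/k$.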

We will show below that in the ordinary smooth case the optimal bandwidth choice for the density problem is influenced by the dependence parameter $\gamma$, as opposed to the optimal bandwidth in standard (non-noisy) kernel density estimation. In particular, if the dependence is moderate, then the optimal bandwidth and the optimal rates for the density estimation are the same as in the i.i.d. case. If the dependence is very strong, the optimal bandwidth depends on $\gamma$ itself; see Proposition 2.2 and Corollary 2.3. In the case of the distribution function, the dependence parameter is always present in the optimal bandwidth and the optimal rates of convergence, as opposed to the non-noisy case (Proposition 2.5).

As for the central limit theorem for the density estimator, we have results mimicking the CLT for standard kernel density estimators (cf. [28, Theorem 2]): if $h_n$ is small, then the CLT is the same as in the i.i.d. case; if $h_n$ is "big", the LRD effect starts to dominate. Note that the change from "i.i.d." behavior to LRD behavior occurs in the same way as in standard kernel estimation, according to $h_n = o(n/\sigma_{n,1}^2)$ or $n/\sigma_{n,1}^2 = o(h_n)$, where $\sigma_{n,1}^2 := \mathrm{Var}\big(\sum_{j=1}^{n}X_j\big)$.

In the distribution case, we do not have such dichotomous behavior and long-range dependence always influences the limiting behavior.

We note in passing that "small" and "big" bandwidths may have different meanings for different estimation problems. For example, "small" bandwidths are different when estimating a function and its derivative (see [22] for a complete analysis in the regression setting). In the present context of density estimation for errors-in-variables models, "small" and "large" bandwidths are the same as in the non-noisy case. Of course, this dichotomous behavior is well known; however, the crucial difference between the noisy and the non-noisy problem is the optimal bandwidth choice. Note that in the non-noisy setting the optimal bandwidth, under appropriate conditions, is not influenced by $\gamma$, regardless of whether estimation of the function (as mentioned above) or of its derivatives is considered. Thus, in errors-in-variables models we observe phenomena different from those described in [22].

Another phenomenon is that for supersmooth densities the optimal bandwidth choice and the rates for $\mathrm{MSE}(f, h_n)$ are always the same as in the i.i.d. case, irrespective of whether the dependence is moderate or very strong. At first sight this message seems optimistic; however, it means that the rate of convergence is so slow that even very strong dependence cannot worsen it.

2. Results 

Recall that by the SRD assumption we mean that $\sum_{k=0}^{\infty}|c_k| < \infty$. Additionally, we assume that $\sum_{k=0}^{\infty}c_k \neq 0$. By the LRD assumption we mean that $c_k \sim k^{-\gamma}L_0(k)$, $\gamma \in (1/2, 1)$.

We assume that $f = f_X$ is twice differentiable with continuous and bounded second order derivatives and that $K$ is of the second order, i.e. $\int uK(u)\,du = 0$ and $0 \neq \int u^2K(u)\,du < \infty$. Furthermore, we assume that
$$|\phi_\epsilon(t)| > 0 \quad \text{for all } t \tag{2}$$
and that $\phi_\epsilon$, $\phi_K$ are twice differentiable with continuous and bounded derivatives. These assumptions are standard in the i.i.d. situation for both the ordinary smooth and the supersmooth case.

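The proofs are based on the following decomposition. Let $\mathcal F_i = \sigma(\epsilon_j, Z_j, j \le i)$. Write
$$nh_n\big(\hat f_n(x_0) - E\hat f_n(x_0)\big) = \sum_{j=1}^{n}\Big(g_n\Big(\frac{x_0-Y_j}{h_n}\Big) - E\Big[g_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big|\mathcal F_{j-1}\Big]\Big) + \sum_{j=1}^{n}\Big(E\Big[g_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big|\mathcal F_{j-1}\Big] - Eg_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big) =: m_n(x_0) + l_n(x_0). \tag{3}$$
Note that $\{m_n(x_0), \mathcal F_n, n \ge 1\}$ is a martingale. We call $l_n(x_0)$ the differentiable part. A similar decomposition is also valid in the distribution case.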

2.1. Ordinary smooth densities 

Throughout this section, we consider the ordinary smooth case, i.e.
$$|t|^{\beta}|\phi_\epsilon(t)| \to B_1 > 0 \ \text{ as } |t| \to \infty, \qquad \beta \ge 1. \tag{4}$$
Furthermore, assume that
$$\mathbf 1_{\{\beta>1\}}\int|u|^{\beta-2}|\phi_K''(u)|\,du + \int|u|^{\beta-1}|\phi_K'(u)|\,du + \int|u|^{\beta}|\phi_K(u)|\,du < \infty \tag{5}$$
and
$$D_1 := \frac{1}{2\pi|B_1|^2}\int|t|^{2\beta}\phi_K^2(t)\,dt < \infty. \tag{6}$$
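To make (4) and (6) concrete, take (our illustrative choice, not one used in the paper) Laplace errors with scale $s$, $\phi_\epsilon(t) = 1/(1+s^2t^2)$, for which $|t|^2|\phi_\epsilon(t)| \to s^{-2}$, i.e. $\beta = 2$ and $B_1 = s^{-2}$, together with the Gaussian kernel, $\phi_K(t) = e^{-t^2/2}$. Then $D_1 = \frac{s^4}{2\pi}\int t^4e^{-t^2}dt = \frac{3s^4}{8\sqrt\pi}$. The sketch below checks this value with a Riemann sum on a truncated grid.

```python
import numpy as np

s = 0.5                                    # Laplace scale (illustrative)
beta, B1 = 2, 1.0 / s**2                   # |t|^2 |phi_eps(t)| -> 1 / s^2
t = np.linspace(-40.0, 40.0, 400_001)
phi_K = np.exp(-0.5 * t**2)                # characteristic function of the Gaussian kernel
integrand = np.abs(t) ** (2 * beta) * phi_K**2
D1_num = integrand.sum() * (t[1] - t[0]) / (2.0 * np.pi * abs(B1) ** 2)
D1_closed = 3.0 * s**4 / (8.0 * np.sqrt(np.pi))
print(D1_num, D1_closed)
```

The integrand decays like $e^{-t^2}$, so truncating at $|t| = 40$ loses nothing visible in double precision.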



To deal with the LRD case we will impose a stronger condition than (5): with $\beta > 1$,
$$\sum_{j=0}^{3}\int|u|^{\beta}|\phi_K^{(j)}(u)|\,du < \infty. \tag{7}$$
We will impose some technical assumptions on the densities:

(A) $f_Z$, $f_\epsilon$, the densities of $Z$ and $\epsilon$, respectively, are uniformly bounded and Lipschitz continuous.

Note that in this case $f_{\epsilon+Z}$, the density of $\epsilon + Z$, $f = f_X$ and $f_Y$, the density of $Y$, are also uniformly bounded and Lipschitz continuous. These conditions are required to handle the SRD case.



(B1) $\sum_{r=1}^{3}\int|f_{\epsilon+Z}^{(r)}(v)|\,dv < \infty$,

(B2) $\sum_{r=0}^{3}\Big[\int\big(f_Z^{(r)}(v)\big)^2dv + \int\big(f_{\epsilon+Z}^{(r)}(v)\big)^2dv\Big] < \infty$ and $f_Z$, $f_{\epsilon+Z}$ are twice differentiable with continuous and bounded derivatives.

These conditions are required for the LRD case (see the Appendix for more discussion). First, we provide the asymptotic expansion of the mean square error.

Proposition 2.1. Assume (4), (5), (6), (A). Under the SRD assumption,
$$\mathrm{MSE}(f, h_n) = \Big(\frac12 f^{(2)}(x_0)\int u^2K(u)\,du\Big)^2h_n^4 + D_1f_Y(x_0)\,n^{-1}h_n^{-(2\beta+1)} + o\big(h_n^4 + n^{-1}h_n^{-(2\beta+1)}\big).$$



Proposition 2.2. Assume (4), (6), (7), (B1), (B2) and $EZ_1^4 < \infty$. Under the LRD assumption,
$$\mathrm{MSE}(f, h_n) = \Big(\frac12 f^{(2)}(x_0)\int u^2K(u)\,du\Big)^2h_n^4 + D_1f_Y(x_0)\,n^{-1}h_n^{-(2\beta+1)} + \big(f_Y^{(1)}(x_0)\big)^2n^{-2}\sigma_{n,1}^2h_n^{-2\beta} + o\big(h_n^4 + n^{-1}h_n^{-(2\beta+1)} + n^{-2}\sigma_{n,1}^2h_n^{-2\beta}\big). \tag{8}$$

If $h_n = o(n/\sigma_{n,1}^2)$ (in particular, $h_n = o(n^{-2(1-\gamma)})$), then the optimal choice is the same as in the i.i.d. case: $h_n = Cn^{-1/(5+2\beta)}$ (here and in the sequel, $C$ is a generic constant which does not depend on $n$). Consequently, $\mathrm{MSE}(f, h_n) \sim Cn^{-4/(5+2\beta)}$. We must check that such a choice is permitted, i.e. that $n^{-1/(5+2\beta)} = o(n^{-2(1-\gamma)})$. This is equivalent to $1 > \gamma > \frac{4(2+\beta)+1}{4(2+\beta)+2}$.

If $n/\sigma_{n,1}^2 = o(h_n)$ (in particular, $n^{-2(1-\gamma)} = o(h_n)$), then we find the optimal bandwidth as $h_n \sim Cn^{-(2\gamma-1)/(2(2+\beta))}$. Then, to ensure that $n^{-2(1-\gamma)} = o\big(n^{-(2\gamma-1)/(2(2+\beta))}\big)$, we assume that $\gamma < \frac{4(2+\beta)+1}{4(2+\beta)+2}$. From Proposition 2.2 we obtain the following corollary.
Corollary 2.3. Assume that $\frac12 < \frac{4(2+\beta)+1}{4(2+\beta)+2} < \gamma < 1$. Then, by choosing $h_n = Cn^{-1/(5+2\beta)}$, we obtain $\mathrm{MSE}(f, h_n) \sim Cn^{-4/(5+2\beta)}$.

Assume that $\frac12 < \gamma < \frac{4(2+\beta)+1}{4(2+\beta)+2} < 1$. With $h_n \sim Cn^{-(2\gamma-1)/(2(2+\beta))}$ we obtain $\mathrm{MSE}(f, h_n) \sim Cn^{-2(2\gamma-1)/(2+\beta)}$.
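The two regimes of Corollary 2.3 can be summarized by a small exponent calculator. This is a sketch that tracks only powers of $n$ (constants and slowly varying factors are dropped); the helper name `density_rates` is hypothetical. It returns $(a, r)$ such that $h_n \sim n^{-a}$ and $\mathrm{MSE}(f, h_n) \sim n^{-r}$.

```python
from fractions import Fraction

def density_rates(beta, gamma):
    """Exponents (a, r) with h_n ~ n^(-a) and MSE(f, h_n) ~ n^(-r), following Corollary 2.3.

    beta >= 1 and 1/2 < gamma < 1 (LRD case); constants and slowly varying factors are ignored.
    """
    beta, gamma = Fraction(beta), Fraction(gamma)
    if not Fraction(1, 2) < gamma < 1:
        raise ValueError("gamma must lie in (1/2, 1)")
    threshold = (4 * (2 + beta) + 1) / (4 * (2 + beta) + 2)
    if gamma > threshold:   # moderate dependence: i.i.d. bandwidth and rate
        return Fraction(1) / (5 + 2 * beta), Fraction(4) / (5 + 2 * beta)
    if gamma < threshold:   # strong dependence: gamma enters both exponents
        return (2 * gamma - 1) / (2 * (2 + beta)), 2 * (2 * gamma - 1) / (2 + beta)
    raise ValueError("gamma equals the threshold; Corollary 2.3 does not cover it")

print(density_rates(2, Fraction(99, 100)))  # (Fraction(1, 9), Fraction(4, 9))
print(density_rates(2, Fraction(4, 5)))     # (Fraction(3, 40), Fraction(3, 10))
```

Exact rational arithmetic makes it easy to see that the two bandwidth exponents coincide at the threshold (for $\beta = 2$, both equal $1/9$ at $\gamma = 17/18$), so the regimes join continuously.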

Remark 2.4. The result of Proposition 2.1 extends the previous ones for $\rho$- and $\alpha$-mixing (see [19, Lemma 2.1b]) or associated sequences ([21]). In principle, it says that the optimal bandwidth and the rate of convergence of $\mathrm{Var}\,\hat f_n(x_0)$ (and, consequently, of $\mathrm{MSE}(f, h_n)$) for weakly dependent sequences are the same as in the i.i.d. case. This is also true for LRD sequences with moderate dependence ($\gamma$ close to 1). On the other hand, if the dependence is very strong, then the bandwidth and the rate of convergence may depend on $\gamma$.

As for the distribution estimator we have the following result.

Proposition 2.5. Assume (4), (6), (7), (B1), (B2) and $EZ_1^4 < \infty$. Under the LRD assumption,
$$\mathrm{MSE}(F, h_n) = \Big(\frac12 f'(x_0)\int u^2K(u)\,du\Big)^2h_n^4 + \big(f_Y(x_0)\big)^2\frac{\sigma_{n,1}^2}{n^2}h_n^{-2\beta} + o\Big(\frac{\sigma_{n,1}^2}{n^2}h_n^{-2\beta} + h_n^4\Big).$$
We can see that the optimal bandwidth is $h_n \sim C(\sigma_{n,1}^2/n^2)^{1/(2\beta+4)}$ and the optimal mean square error is of the order $(\sigma_{n,1}^2/n^2)^{2/(\beta+2)}$. Under weak dependence the optimal bandwidth and the optimal mean square error are proportional to $n^{-1/(2(\beta+2))}$ and $n^{-2/(\beta+2)}$, respectively. Consequently, in the case of the distribution function the optimal bandwidth and the rates change as soon as we cross the boundary between short and long range dependence.
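A companion sketch for Proposition 2.5 (hypothetical helper `distribution_rates`, powers of $n$ only) shows that the exponents for $\hat F_n$ change as soon as one crosses from SRD to LRD. It assumes $\sigma_{n,1}^2 \sim n$ under weak dependence and $\sigma_{n,1}^2 \sim n^{3-2\gamma}$ (up to a slowly varying factor) under LRD, and plugs these into $h_n \sim (\sigma_{n,1}^2/n^2)^{1/(2\beta+4)}$ and $\mathrm{MSE} \sim (\sigma_{n,1}^2/n^2)^{2/(\beta+2)}$.

```python
from fractions import Fraction

def distribution_rates(beta, gamma=None):
    """Exponents (a, r) with h_n ~ n^(-a) and MSE(F, h_n) ~ n^(-r).

    gamma=None stands for weak dependence (sigma_{n,1}^2 ~ n); otherwise
    sigma_{n,1}^2 ~ n^(3 - 2 gamma). Slowly varying factors are ignored.
    """
    beta = Fraction(beta)
    e = Fraction(1) if gamma is None else 2 * Fraction(gamma) - 1  # sigma_{n,1}^2 / n^2 ~ n^(-e)
    return e / (2 * beta + 4), 2 * e / (beta + 2)

print(distribution_rates(2))                  # (Fraction(1, 8), Fraction(1, 2))
print(distribution_rates(2, Fraction(3, 4)))  # (Fraction(1, 16), Fraction(1, 4))
```

Since $2\gamma - 1 < 1$ for every $\gamma < 1$, both LRD exponents are strictly smaller than their SRD counterparts, which is the jump described above.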

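As for the CLT we have the following results.

Theorem 2.6. Suppose that $nh_n \to \infty$ and let $\sigma^2(x_0) = D_1f_Y(x_0)$. Under the conditions of Proposition 2.1 we have
$$n^{1/2}h_n^{\beta+1/2}\big(\hat f_n(x_0) - E\hat f_n(x_0)\big) \xrightarrow{d} N(0, \sigma^2(x_0)).$$

Theorem 2.7. Suppose that $nh_n \to \infty$ and let $\sigma^2(x_0) = D_1f_Y(x_0)$. Under the conditions of Proposition 2.2 we have
$$n^{1/2}h_n^{\beta+1/2}\big(\hat f_n(x_0) - E\hat f_n(x_0)\big) \xrightarrow{d} N(0, \sigma^2(x_0))$$
if $h_n = o(n/\sigma_{n,1}^2)$, and
$$\frac{nh_n^{\beta}}{\sigma_{n,1}}\big(\hat f_n(x_0) - E\hat f_n(x_0)\big) \xrightarrow{d} N\big(0, (f_Y^{(1)}(x_0))^2\big)$$
if $n/\sigma_{n,1}^2 = o(h_n)$. Under the conditions of Proposition 2.2 we have in either case
$$\frac{nh_n^{\beta}}{\sigma_{n,1}}\big(\hat F_n(x_0) - E\hat F_n(x_0)\big) \xrightarrow{d} N\big(0, (f_Y(x_0))^2\big).$$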
Remark 2.8. Theorem 2.6 extends the results of [10] and [20]. The result of Theorem 2.7 should be compared with Theorem 2 in [28]. Note that the change from SRD to LRD behavior occurs in the same way as in the standard kernel density case, i.e. by crossing the boundary $h_n \sim n/\sigma_{n,1}^2$. Theorem 2.7 describes the dichotomous behavior of $\hat f_n(x_0)$. If $f_Y^{(1)}(x_0) = 0$, then we may establish trichotomous behavior along the lines of Theorem 3 in [28].
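The dichotomy in Theorem 2.7 depends only on how $h_n$ compares with $n/\sigma_{n,1}^2$. Here is a minimal sketch for polynomial bandwidths $h_n = n^{-a}$, assuming $n/\sigma_{n,1}^2 \sim n^{-2(1-\gamma)}$ and ignoring slowly varying factors; the helper name `clt_regime` and its return labels are hypothetical.

```python
from fractions import Fraction

def clt_regime(a, gamma):
    """Which normalization of Theorem 2.7 applies when h_n = n^(-a), a > 0.

    Assumes n / sigma_{n,1}^2 ~ n^(-2(1 - gamma)); slowly varying factors are ignored,
    so the boundary case a == 2(1 - gamma) is left undecided.
    """
    a, gamma = Fraction(a), Fraction(gamma)
    b = 2 * (1 - gamma)
    if a > b:
        return "iid"        # h_n = o(n / sigma_{n,1}^2): normalization n^(1/2) h_n^(beta + 1/2)
    if a < b:
        return "lrd"        # n / sigma_{n,1}^2 = o(h_n): normalization n h_n^beta / sigma_{n,1}
    return "undecided"

print(clt_regime(Fraction(1, 9), Fraction(99, 100)))  # iid
print(clt_regime(Fraction(1, 9), Fraction(9, 10)))    # lrd
```

For $\beta = 2$ the i.i.d.-optimal exponent is $a = 1/9$; it stays in the i.i.d. regime for $\gamma = 99/100$ but not for $\gamma = 9/10$, in line with the threshold $17/18$ from Corollary 2.3.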

Remark 2.9. We shall comment on the assumption $EZ_1^4 < \infty$. It is needed in order to use Wu's result [26] for empirical processes (see Lemma C below). Instead, we can use the assumption $E|Z_1|^{2+\delta} < \infty$ of Giraitis and Surgailis [12] together with an additional condition on $f_Z$ (see also Section 2.2). However, this does not completely solve the problem in the case $EZ_1^2 < \infty$.

Remark 2.10. It would be desirable to extend the results of, especially, Propo- 
sition 2.2 and Theorem 2.7 to the multivariate setting. However, it does not 
seem to be feasible when using the martingale approximation approach as in 
the current paper. 

Remark 2.11. We do not provide a CLT for $\hat F_n(x_0)$ in the weakly dependent case. The martingale method we use here is based on the fact that in the density case the differentiable part is negligible compared to the martingale part, provided that the SRD conditions hold (compare (18) with (20)). However, in the distribution case, if the SRD assumptions are fulfilled, then the martingale part and the differentiable part are of the same order and the method does not apply. We also note that the problem is symmetric in $X$ and $\epsilon$, i.e. instead of assuming that the $X_j$ are dependent and the $\epsilon_j$ are i.i.d., we may assume that the $X_j$ are i.i.d. and the $\epsilon_j$ are dependent. What is important in our results is the dependence structure of the $Y_j$'s. In [17] it is assumed that $X_j$ is mixing, and it is claimed that $\mathrm{Var}\,\hat F_n(x_0)$ behaves differently according to whether the $\epsilon_j$ are dependent or i.i.d. Note, however, that their proof of Lemma 3.2(i) is invalid.

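To obtain a confidence interval for $f(x_0)$ we choose an appropriate bandwidth to make sure that the variance of the estimator dominates the bias term. In particular, in the LRD case it reads as follows.

Corollary 2.12. Assume that $h_n = o(n/\sigma_{n,1}^2)$ and $h_n = o(n^{-1/(5+2\beta)})$ (which ensures that $\frac12 < \frac{4(2+\beta)+1}{4(2+\beta)+2} < \gamma < 1$). Then
$$n^{1/2}h_n^{\beta+1/2}\big(\hat f_n(x_0) - f(x_0)\big) \xrightarrow{d} N\big(0, D_1f_Y(x_0)\big). \tag{9}$$

Corollary 2.13. Assume that $n/\sigma_{n,1}^2 = o(h_n)$ and $h_n = o\big(n^{-(2\gamma-1)/(2(2+\beta))}\big)$ (which ensures $\frac12 < \gamma < \frac{4(2+\beta)+1}{4(2+\beta)+2} < 1$). Then
$$\frac{nh_n^{\beta}}{\sigma_{n,1}}\big(\hat f_n(x_0) - f(x_0)\big) \xrightarrow{d} N\big(0, (f_Y^{(1)}(x_0))^2\big).$$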
To apply Corollary 2.12 in a practical situation one has to estimate $f_Y(x_0)$. If $h_n = o(n/\sigma_{n,1}^2)$ and $h_n = o(n^{-1/(5+2\beta)})$, then we can estimate $f_Y(x_0)$ by using
$$\hat f_Y(x_0) = \frac{1}{nb_n}\sum_{j=1}^{n}K\Big(\frac{x_0 - Y_j}{b_n}\Big).$$
Note that the kernel density bandwidth $b_n$ need not be the same as $h_n$, as assumed in [20]. In fact, the optimal mean-square error choice is $b_n \sim Cn^{-1/5}$. Further, if $\gamma > 9/10$ (which is ensured by taking $\frac12 < \frac{4(2+\beta)+1}{4(2+\beta)+2} < \gamma < 1$), then both the mean square error and the variance of the kernel density estimator behave like $Cn^{-4/5}$. On the other hand, from Corollary 2.3, the variance of the deconvolution estimator behaves like $n^{-1}h_n^{-(2\beta+1)}$. Thus, under the constraint $h_n = o(n^{-1/(5+2\beta)})$ the variance of the deconvolution estimator of $f_X(x_0)$ dominates the variance of the kernel density estimator of $f_Y(x_0)$. Consequently, we may build a confidence interval for $f(x_0)$ by replacing $f_Y(x_0)$ with its kernel estimator in (9).
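The plug-in procedure just described can be sketched as follows. Everything beyond the recipe in the text is an illustrative assumption: Laplace errors with known scale $s$ (so $\beta = 2$), a standard Gaussian kernel used both for the deconvolution estimator and for $\hat f_Y$ (which gives $D_1 = 3s^4/(8\sqrt\pi)$), $b_n = n^{-1/5}$, and the normal-approximation interval $\hat f_n(x_0) \pm z_{1-\alpha/2}\{D_1\hat f_Y(x_0)/(nh_n^{2\beta+1})\}^{1/2}$. The caller is responsible for an undersmoothing $h_n$ as required by Corollary 2.12; `plug_in_ci` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist

def plug_in_ci(x0, y, h, s, level=0.95):
    """Asymptotic confidence interval for f_X(x0) from Y = X + eps, eps ~ Laplace(0, s).

    Illustrative assumptions: standard Gaussian kernel K, hence beta = 2 and
    D_1 = 3 s^4 / (8 sqrt(pi)); h must undersmooth so that the bias is negligible.
    """
    y = np.asarray(y, dtype=float)
    n, beta = y.size, 2
    u = (x0 - y) / h
    k_u = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    f_x = np.sum(k_u * (1.0 - (s / h) ** 2 * (u**2 - 1.0))) / (n * h)   # deconvolution estimate
    b = n ** (-1.0 / 5.0)                                               # b_n ~ n^(-1/5)
    v = (x0 - y) / b
    f_y = np.sum(np.exp(-0.5 * v**2)) / (n * b * np.sqrt(2.0 * np.pi))  # kernel estimate of f_Y(x0)
    d1 = 3.0 * s**4 / (8.0 * np.sqrt(np.pi))
    se = np.sqrt(d1 * f_y / (n * h ** (2 * beta + 1)))                  # Var ~ D_1 f_Y / (n h^(2 beta + 1))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return f_x - z * se, f_x + z * se

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50_000
    y = rng.standard_normal(n) + rng.laplace(0.0, 0.3, size=n)   # i.i.d. toy data
    print(plug_in_ci(0.0, y, h=0.3, s=0.3))
```

The toy call uses a fixed $h$ only to show the interface; in practice $h_n$ should shrink faster than $n^{-1/(5+2\beta)}$.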

Remark 2.14. As suggested by the Referee, assume that
$$X_i = (1-B)^{-\delta_0}\phi^{-1}(B)\psi(B)Z_i,$$
where, as before, $Z_j$ are i.i.d., $B$ is the backshift operator and $\phi$, $\psi$ are polynomials. If $\delta_0 \in (0, 1/2)$, this is a particular case of the LRD model (1) with the specification $1 - \gamma = \delta_0$. If $\delta_0 \in (-1/2, 0)$ (so that $\gamma \in (1, 3/2)$), then this is the case of antipersistent sequences. Under appropriate regularity conditions, it was proven in [1, Theorem 3] that for all $\gamma \in (1/2, 3/2)$
$$\frac{1}{n^{3-2\gamma}}\mathrm{Var}\Big(\sum_{i=1}^{n}X_i\Big) \to v,$$
where $v$ is a finite and positive constant. In view of the above result, it is intuitively clear that the expansion (8) is valid for $\gamma \in (1, 3/2)$ as well. However, following the comment below Proposition 2.2, in such a case we always have $h_n = o(n/\sigma_{n,1}^2)$. Consequently, the rates of convergence are the same as in the i.i.d. case. This is in contrast to fixed-design regression, where antipersistence may improve the rates of convergence beyond those for the i.i.d. case. See [1] and [2] for more details.

2.2. Supersmooth densities 

In the supersmooth case we consider the usual assumptions (cf. [20]):

(i) $B_1|t|^{\beta_0}\exp(-\alpha|t|^{\beta}) \le |\phi_\epsilon(t)| \le B_2|t|^{\beta_0}\exp(-\alpha|t|^{\beta})$ as $|t| \to \infty$, for some $\alpha > 0$, $B_1, B_2, \beta > 0$, $\beta_0 \in \mathbb R$.

(ii) $\phi_K$ has a finite support $(-d, d)$.

(iii) $|\phi_K(t)| \le B_3(d-t)^m$ and $\phi_K(t) \ge B_4(d-t)^m$ for $t \in (d-\delta, d)$ and positive constants $\delta, m, B_3, B_4$.

(iv) The real (imaginary) part of $\phi_\epsilon$ is negligible as $t \to \infty$ with respect to the imaginary (real) part.

To deal with the LRD linear processes (1), recall that $\rho_k = L(k)k^{-(2\gamma-1)}$ as $k \to \infty$. We assume additionally that for all $x, y$,
$$f_k(x,y) = f_Y(x)f_Y(y) + \rho_kf_Y'(x)f_Y'(y) + h_k(x,y), \tag{10}$$
where $f_k$ is the joint density of $(Y_0, Y_k)$ and
$$|h_k(x,y)| \le |\rho_k|^{1+\delta}h(x)h(y), \tag{11}$$
with $\delta > 0$ and $h$ being an integrable and continuous function.

Proposition 2.15. Assume (i)-(iv) and that $f_Y$ is continuous and integrable. Under the LRD assumption and (10) we have
$$\mathrm{MSE}(f, h_n) = O\big((\ln n)^{-2/\beta}\big)$$
by choosing $h_n = d\big(2\alpha/((1-\theta_0)\ln n)\big)^{1/\beta}$, $\theta_0 \in (2-2\gamma, 1)$.

The result of Proposition 2.15 means that in the supersmooth case long range dependence has no influence on the optimal bandwidth choice and the optimal rates for $\mathrm{MSE}(f, h_n)$. They are the same as in the i.i.d. and weakly dependent situations (cf. [9, 19]).
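The bandwidth in Proposition 2.15 is tuned so that the exponential factor $\exp(2\alpha(d/h_n)^{\beta})$, which drives the variance bounds in the proof, grows exactly like $n^{1-\theta_0}$, while $h_n^2 \asymp (\ln n)^{-2/\beta}$ sets the bias. The sketch checks both identities numerically for illustrative values of our choosing: $N(0, \sigma^2)$ errors, $\phi_\epsilon(t) = \exp(-\sigma^2t^2/2)$, which would satisfy (i) with $\beta = 2$, $\beta_0 = 0$, $\alpha = \sigma^2/2$. The helper name `supersmooth_bandwidth` is hypothetical, and $\theta_0$ must still be taken in $(2-2\gamma, 1)$.

```python
import math

def supersmooth_bandwidth(n, d, alpha, beta, theta0):
    """h_n = d * (2 alpha / ((1 - theta0) ln n))^(1 / beta), the choice in Proposition 2.15."""
    return d * (2.0 * alpha / ((1.0 - theta0) * math.log(n))) ** (1.0 / beta)

sigma, d, theta0 = 0.5, 1.0, 0.5          # illustrative values
alpha, beta = sigma**2 / 2.0, 2.0         # Gaussian errors: beta = 2, alpha = sigma^2 / 2
for n in (10**3, 10**6, 10**9):
    h = supersmooth_bandwidth(n, d, alpha, beta, theta0)
    growth = math.exp(2.0 * alpha * (d / h) ** beta)          # equals n^(1 - theta0)
    print(n, round(h, 4), growth / n ** (1 - theta0), h**2 * math.log(n) ** (2.0 / beta))
```

The last column is constant ($d^2(2\alpha/(1-\theta_0))^{2/\beta}$), so the squared bias decays only logarithmically, which is why dependence cannot make the rate any worse.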

Remark 2.16. Let $f_{k,X}$ be the joint density of $(X_0, X_k)$, where $\{X_i, i \ge 1\}$ is the LRD linear process $X_i = \sum_{k=0}^{\infty}c_kZ_{i-k}$. Then
$$f_{k,X}(x,y) = f_X(x)f_X(y) + \rho_kf_X'(x)f_X'(y) + h(x,y) \tag{12}$$
with $h$ satisfying (11), provided an appropriate smoothness condition on $\phi_Z$ holds. We refer to [8, 12] or [23] for more details. Consequently, having established (12), it is easy to verify (10).

Remark 2.17. Note that the martingale approximation method used in the ordinary smooth case requires precise information about $\|g_n\|_1$, in particular its finiteness. This is not feasible in the supersmooth case. Instead, we additionally assume (10). We could have worked with this assumption in the ordinary smooth case and obtained the results for $\mathrm{MSE}(f, h_n)$. However, using the linear structure and the martingale approximation method we can obtain at the same time $\mathrm{MSE}(f, h_n)$ and the central limit theorem.

3. Proofs 

Since $f_X$ is twice differentiable with continuous and bounded second order derivatives and $K$ is of the second order, we obtain (see [18])
$$\mathrm{bias}\big(\hat f_n(x_0)\big) \sim h_n^2\,\frac12f^{(2)}(x_0)\int u^2K(u)\,du. \tag{13}$$
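A quick sanity check of (13) in a fully explicit case of our choosing: $f = N(0,1)$ density and a standard Gaussian kernel, so $\int u^2K(u)\,du = 1$, $f^{(2)}(0) = -1/\sqrt{2\pi}$, and $E\hat f_n(0) = (K_{h_n} * f)(0)$ is the $N(0, 1+h_n^2)$ density at 0. The ratio of the exact bias to the leading term of (13) should tend to 1 as $h_n \to 0$; the function names are hypothetical.

```python
import math

def exact_bias(h):
    """E f_hat(0) - f(0) for f = N(0, 1) and a standard Gaussian kernel."""
    return 1.0 / math.sqrt(2.0 * math.pi * (1.0 + h**2)) - 1.0 / math.sqrt(2.0 * math.pi)

def bias_13(h):
    """Leading term of (13): h^2 * (1/2) f''(0) * int u^2 K(u) du, with f''(0) = -1/sqrt(2 pi)."""
    return 0.5 * h**2 * (-1.0 / math.sqrt(2.0 * math.pi))

for h in (0.5, 0.1, 0.02):
    print(h, exact_bias(h) / bias_13(h))   # ratio approaches 1 as h -> 0
```

Here the ratio equals $(1 - (1+h^2)^{-1/2})/(h^2/2) = 1 - \tfrac34h^2 + O(h^4)$, so the relative error of (13) is of order $h_n^2$.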



R. Kulik/Deconvolution and dependence 



731 



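3.1. Ordinary smooth case

We have
$$\mathrm{MSE}(f, h_n) = \mathrm{Var}\,\hat f_n(x_0) + \big(\mathrm{bias}(\hat f_n(x_0))\big)^2 = \frac{1}{(nh_n)^2}\Big(Em_n^2(x_0) + El_n^2(x_0) + 2Em_n(x_0)l_n(x_0)\Big) + \big(\mathrm{bias}(\hat f_n(x_0))\big)^2.$$
Since $m_n$ is a martingale,
$$Em_n^2(x_0) = nE\Big(g_n\Big(\frac{x_0-Y_1}{h_n}\Big) - E\Big[g_n\Big(\frac{x_0-Y_1}{h_n}\Big)\Big|\mathcal F_0\Big]\Big)^2.$$
Let $\zeta_i = g_n\big((x_0-Y_i)/h_n\big)$ and $\xi_i = \zeta_i - E(\zeta_i|\mathcal F_{i-1})$. Under (2), (4) and (5) we have (cf. [18, Lemma 3])
$$\|g_n\|_1 = O(h_n^{-\beta}). \tag{14}$$
Also, if additionally (6) holds, then
$$E[\zeta_1^2] = \int g_n^2\Big(\frac{x_0-u}{h_n}\Big)f_Y(u)\,du \sim D_1f_Y(x_0)h_n^{1-2\beta} \tag{15}$$
as $n \to \infty$, see [18, Lemma 4].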
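The factor $h_n^{2\beta}\int g_n^2(v)\,dv \to D_1$ behind $E[g_n^2((x_0-Y_1)/h_n)] \sim D_1f_Y(x_0)h_n^{1-2\beta}$ can be checked in closed form for an illustrative pair of our choosing: a standard Gaussian kernel and Laplace errors with scale $s$ ($\beta = 2$, $B_1 = s^{-2}$). By Parseval's identity, $\int g_n^2 = \frac{1}{2\pi}\int\phi_K^2(t)|\phi_\epsilon(t/h_n)|^{-2}dt$, which gives $h_n^4\int g_n^2(v)\,dv = (h_n^4 + s^2h_n^2 + \tfrac34s^4)/(2\sqrt\pi) \to D_1 = 3s^4/(8\sqrt\pi)$. The sketch compares a Riemann sum with this expression; `h4_int_gn_sq` is a hypothetical name.

```python
import numpy as np

def h4_int_gn_sq(h, s, x_max=40.0, num=800_001):
    """h^4 * int g_n(v)^2 dv for a Gaussian K and Laplace(0, s) errors (beta = 2), by a Riemann sum."""
    v = np.linspace(-x_max, x_max, num)
    g = np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi) * (1.0 - (s / h) ** 2 * (v**2 - 1.0))
    return h**4 * np.sum(g**2) * (v[1] - v[0])

s = 0.5
D1 = 3.0 * s**4 / (8.0 * np.sqrt(np.pi))
for h in (0.2, 0.05, 0.01):
    closed = (h**4 + h**2 * s**2 + 0.75 * s**4) / (2.0 * np.sqrt(np.pi))  # Parseval
    print(h, h4_int_gn_sq(h, s), closed, D1)
```

The gap between the closed form and $D_1$ is $(h_n^4 + s^2h_n^2)/(2\sqrt\pi)$, i.e. of order $h_n^2$.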

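Let $X_{j,j-1} = \sum_{k=1}^{\infty}c_kZ_{j-k} = X_j - Z_j$. Then, by (14) and the boundedness of $f_{\epsilon+Z}$,
$$E\big(E[\zeta_1|\mathcal F_0]\big)^2 = E\Big(\int g_n\Big(\frac{x_0-(u+X_{1,0})}{h_n}\Big)f_{\epsilon+Z}(u)\,du\Big)^2 \le O(1)\,E\Big(\int\Big|g_n\Big(\frac{x_0-(u+X_{1,0})}{h_n}\Big)\Big|\,du\Big)^2 = O(h_n^2)\Big(\int|g_n(v)|\,dv\Big)^2 = O(h_n^{2-2\beta}). \tag{16}$$
Thus, by (15), (16) and the Cauchy-Schwarz inequality,
$$E\xi_1^2 = E\zeta_1^2 + E\big(E[\zeta_1|\mathcal F_0]\big)^2 - 2E\big[\zeta_1E[\zeta_1|\mathcal F_0]\big] = E\zeta_1^2 + O(h_n^{2-2\beta}) + O(h_n^{1/2-\beta}h_n^{1-\beta}) = E\zeta_1^2 + o(h_n^{1-2\beta}). \tag{17}$$
Consequently, via (15), $E\xi_1^2 \sim D_1f_Y(x_0)h_n^{1-2\beta}$ as $n \to \infty$ (note that $\xi_1$ depends on $n$) and
$$Em_n^2(x_0) \sim D_1f_Y(x_0)\,nh_n^{1-2\beta}. \tag{18}$$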

For $r = 0, 1, 2$, let
$$R_n^{(r)}(z) = \sum_{j=1}^{n}\Big(f_{\epsilon+Z}^{(r)}(x_0 - X_{j,j-1} + z) - f_Y^{(r)}(x_0 + z)\Big).$$
Note that $f_Y^{(r)}(y) = Ef_{\epsilon+Z}^{(r)}(y - X_{1,0})$ (see Lemma D). Also, $E\big[g_n\big(\frac{x_0-Y_j}{h_n}\big)\big|\mathcal F_{j-1}\big] = h_n\int g_n(v)f_{\epsilon+Z}(x_0 - X_{j,j-1} - h_nv)\,dv$. Consequently,
$$l_n(x_0) = h_n\int g_n(v)R_n^{(0)}(-h_nv)\,dv. \tag{19}$$

Proof of Proposition 2.1. We claim that under the SRD assumption, $\mathrm{Var}\,l_n(x_0) = O(nh_n^{2-2\beta})$. Indeed, by [28, Lemma 3], $\sup_zE\big(R_n^{(0)}(z)\big)^2 = O(n)$. From (19) and by the Cauchy-Schwarz inequality,
$$El_n^2(x_0) = h_n^2E\Big(\int g_n(v)R_n^{(0)}(-h_nv)\,dv\Big)^2 = h_n^2E\int\!\!\int g_n(u)g_n(v)R_n^{(0)}(-h_nu)R_n^{(0)}(-h_nv)\,du\,dv$$
$$\le h_n^2\int\!\!\int|g_n(u)||g_n(v)|\,E\big|R_n^{(0)}(-h_nu)R_n^{(0)}(-h_nv)\big|\,du\,dv \le O(n)\,h_n^2\Big(\int|g_n(u)|\,du\Big)^2 = O(nh_n^{2-2\beta}). \tag{20}$$
Consequently, comparing (20) with (18), we see that $l_n(x_0)$ is negligible compared to the martingale part $m_n(x_0)$. Also, via the Cauchy-Schwarz inequality, the mixed term $l_n(x_0)m_n(x_0)$ is negligible. The result of Proposition 2.1 follows by considering (13) and (18). □

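Proof of Proposition 2.2. Recall (19). Take the Taylor expansion
$$R_n^{(0)}(-h_nv) = R_n^{(0)}(0) - h_nvR_n^{(1)}(\xi),$$
where $\xi = \xi(v) = \xi(v, h_n)$. Thus,
$$El_n^2(x_0) = h_n^2E\big(R_n^{(0)}(0)\big)^2\Big(\int g_n(v)\,dv\Big)^2 - 2h_n^3E\Big[R_n^{(0)}(0)\int g_n(u)\,du\int vg_n(v)R_n^{(1)}(\xi(v))\,dv\Big] + h_n^4E\Big(\int vg_n(v)R_n^{(1)}(\xi(v))\,dv\Big)^2.$$
From Lemmas B and C, the first term is $\big(f_Y^{(1)}(x_0)\big)^2h_n^{2-2\beta}\sigma_{n,1}^2(1+o(1))$. On account of Lemmas A and C and by the Cauchy-Schwarz inequality, the third term is bounded by
$$O(h_n^4)\int\!\!\int|ug_n(u)||vg_n(v)|\,E\big|R_n^{(1)}(\xi(u))R_n^{(1)}(\xi(v))\big|\,du\,dv = O(h_n^4)\sup_zE\big(R_n^{(1)}(z)\big)^2\Big(\int|u||g_n(u)|\,du\Big)^2 = O(h_n^2\sigma_{n,1}^2)\Big(h_n^{\beta+1}\int|u||g_n(u)|\,du\Big)^2h_n^{-2\beta} = o(h_n^{2-2\beta}\sigma_{n,1}^2).$$
Similarly, for the second term the bound is (cf. Lemmas A and C)
$$O(h_n^3h_n^{-\beta})\Big(E\big(R_n^{(0)}(0)\big)^2\Big)^{1/2}\Big(\sup_zE\big(R_n^{(1)}(z)\big)^2\Big)^{1/2}\int|ug_n(u)|\,du = O(h_n^{2-2\beta}\sigma_{n,1}^2)\,h_n^{\beta+1}\int|ug_n(u)|\,du = o(h_n^{2-2\beta}\sigma_{n,1}^2).$$
We conclude
$$El_n^2(x_0) = \big(f_Y^{(1)}(x_0)\big)^2h_n^{2-2\beta}\sigma_{n,1}^2 + o(h_n^{2-2\beta}\sigma_{n,1}^2). \tag{21}$$
Combining (13), (18) and (21) we obtain (8). □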



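Proof of Proposition 2.5. We sketch it briefly, since it is similar to the previous one. Let $G_n(x) = \int_{-\infty}^{x}g_n(u)\,du$. Then $\hat F_n(x_0) = \frac1n\sum_{j=1}^{n}G_n\big(\frac{x_0-Y_j}{h_n}\big)$ and
$$\hat F_n(x_0) - E\hat F_n(x_0) = \frac1n\sum_{j=1}^{n}\Big(G_n\Big(\frac{x_0-Y_j}{h_n}\Big) - EG_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big).$$
Similarly to (3),
$$\sum_{j=1}^{n}\Big(G_n\Big(\frac{x_0-Y_j}{h_n}\Big) - EG_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big) = \sum_{j=1}^{n}\Big(G_n\Big(\frac{x_0-Y_j}{h_n}\Big) - E\Big[G_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big|\mathcal F_{j-1}\Big]\Big) + \sum_{j=1}^{n}\Big(E\Big[G_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big|\mathcal F_{j-1}\Big] - EG_n\Big(\frac{x_0-Y_j}{h_n}\Big)\Big) =: M_n(x_0) + L_n(x_0).$$
Then
$$EM_n^2(x_0) = O(nh_n^{-2\beta}). \tag{22}$$
Further, we have, as in (19),
$$L_n(x_0) = h_n\int G_n(v)R_n^{(0)}(-h_nv)\,dv = \int g_n(v)R_n^{(-1)}(-h_nv)\,dv,$$
where $R_n^{(-1)}$ is defined as $R_n^{(0)}$ with the densities $f_{\epsilon+Z}$, $f_Y$ replaced by the distribution functions $F_{\epsilon+Z}$, $F_Y$. Taking the Taylor expansion $R_n^{(-1)}(-h_nv) = R_n^{(-1)}(0) - h_nvR_n^{(0)}(\xi)$ we obtain, as in the proof of Proposition 2.2,
$$EL_n^2(x_0) = \big(f_Y(x_0)\big)^2\sigma_{n,1}^2h_n^{-2\beta} + o\big(\sigma_{n,1}^2h_n^{-2\beta}\big). \tag{23}$$
Comparing (22) with (23) we can see that the martingale part is of smaller order. Consequently,
$$\mathrm{Var}\,\hat F_n(x_0) = \big(f_Y(x_0)\big)^2\frac{\sigma_{n,1}^2}{n^2}h_n^{-2\beta} + o\Big(\frac{\sigma_{n,1}^2}{n^2}h_n^{-2\beta}\Big).$$
Since $\mathrm{bias}(\hat F_n(x_0)) = \frac12f'(x_0)\int u^2K(u)\,du\,h_n^2 + o(h_n^2)$, we conclude the result. □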
In order to prove the CLT, we will use the martingale central limit theorem.

Lemma 3.1. Assume that $nh_n \to \infty$. Then
$$\big(nh_n^{1-2\beta}\big)^{-1/2}m_n(x_0) \xrightarrow{d} N(0, \sigma^2(x_0)).$$

Proof. The proof is similar to that of Lemma 2 in [28]. Since $m_n$ is a martingale, it suffices to verify the Lindeberg condition and the convergence of conditional variances.

Let $\tilde\zeta_j = (nh_n^{1-2\beta})^{-1/2}\zeta_j$ and $\tilde\xi_j = (nh_n^{1-2\beta})^{-1/2}\xi_j$. Note that for sufficiently large $n$ we have $|h_n^{\beta}g_n(v)| \le C$, and the bound depends neither on $v$ nor on $n$. As for the Lindeberg condition, we have
$$nE\big[\tilde\xi_1^2\mathbf 1_{\{|\tilde\xi_1|>\varepsilon\}}\big] \le 4nE\big[\tilde\zeta_1^2\mathbf 1_{\{|\tilde\zeta_1|>\varepsilon/2\}}\big] = 4h_n^{2\beta}\int g_n^2(v)f_Y(x_0 - vh_n)\mathbf 1_{\{|g_n(v)|>(nh_n^{1-2\beta})^{1/2}\varepsilon/2\}}\,dv = O(1)\int h_n^{2\beta}g_n^2(v)\mathbf 1_{\{h_n^{\beta}|g_n(v)|>(nh_n)^{1/2}\varepsilon/2\}}\,dv.$$
The set $\{h_n^{\beta}|g_n(v)| > (nh_n)^{1/2}\varepsilon/2\}$ becomes empty for sufficiently large $n$. Consequently,
$$nE\big[\tilde\xi_1^2\mathbf 1_{\{|\tilde\xi_1|>\varepsilon\}}\big] \to 0$$
as $n \to \infty$.

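Now, we want to show that
$$\sum_{j=1}^{n}E\big[\tilde\xi_j^2\big|\mathcal F_{j-1}\big] \xrightarrow{P} \sigma^2(x_0). \tag{24}$$
As in (17), we have
$$E\big[\xi_j^2\big|\mathcal F_{j-1}\big] = E\big[\zeta_j^2\big|\mathcal F_{j-1}\big] - \big(E[\zeta_j|\mathcal F_{j-1}]\big)^2 = E\big[\zeta_j^2\big|\mathcal F_{j-1}\big] + O_P(h_n^{2-2\beta}).$$
Consequently,
$$\sum_{j=1}^{n}E\big[\tilde\xi_j^2\big|\mathcal F_{j-1}\big] - \sum_{j=1}^{n}E\big[\tilde\zeta_j^2\big|\mathcal F_{j-1}\big] = o_P(1)$$
and it suffices to prove
$$\sum_{j=1}^{n}E\big[\tilde\zeta_j^2\big|\mathcal F_{j-1}\big] \xrightarrow{P} \sigma^2(x_0).$$
We have
$$\sum_{j=1}^{n}E\big[\tilde\zeta_j^2\big|\mathcal F_{j-1}\big] - h_n^{2\beta}\int g_n^2(v)f_Y(x_0 - h_nv)\,dv = \frac{h_n^{2\beta}}{n}\int g_n^2(v)R_n^{(0)}(-h_nv)\,dv,$$
where the subtracted term converges to $\sigma^2(x_0)$ by (15), and
$$\frac{h_n^{2\beta}}{n}\Big|\int g_n^2(v)R_n^{(0)}(-h_nv)\,dv\Big| \le \frac{h_n^{2\beta}}{n}\int g_n^2(v)\big|R_n^{(0)}(-h_nv) - R_n^{(0)}(0)\big|\,dv + \frac1n\big|R_n^{(0)}(0)\big|h_n^{2\beta}\int g_n^2(v)\,dv.$$
By (15) and the ergodic theorem, the second part is $o_P(1)$. By Lipschitz continuity of $f_{\epsilon+Z}$ and $f_Y$, the first part is bounded by
$$O(h_n^{2\beta})\int g_n^2(v)\min\{1, |h_nv|\}\,dv,$$
which converges to 0 by the dominated convergence theorem, since $h_n^{\beta}g_n(v)$ is uniformly bounded in $v$ and $n$ and integrable. □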

Proof of Theorem 2.6. It follows from Lemma 3.1 and (20). □

Proof of Theorem 2.7. If $h_n = o(n/\sigma_{n,1}^2)$, then it follows from Lemma 3.1 and (21), since the martingale part dominates the differentiable part. If $n/\sigma_{n,1}^2 = o(h_n)$, then, as in the proof of Proposition 2.2,
$$\sigma_{n,1}^{-1}h_n^{\beta-1}l_n(x_0) = \sigma_{n,1}^{-1}R_n^{(0)}(0) + o_P(1).$$
Consequently, the result follows by Lemma C, since the martingale part is negligible. □



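3.2. Supersmooth case

Proof of Proposition 2.15. Let $Z_{n,j} = \frac{1}{h_n}g_n\big(\frac{x_0-Y_j}{h_n}\big)$. Then
$$\mathrm{Var}\,\hat f_n(x_0) = \frac1n\mathrm{Var}\,Z_{n,0} + \frac2n\sum_{j=1}^{n-1}(1-j/n)\mathrm{Cov}(Z_{n,0}, Z_{n,j})$$
$$= \frac1n\mathrm{Var}\,Z_{n,0} + O\Big(\frac2n\sum_{j=1}^{n-1}(1-j/n)|\rho_j|^{1+\delta}\Big(\frac{1}{h_n}\int\Big|g_n\Big(\frac{x_0-u}{h_n}\Big)\Big|h(u)\,du\Big)^2\Big) + \frac2n\sum_{j=1}^{n-1}(1-j/n)\rho_j\Big(\frac{1}{h_n}\int g_n\Big(\frac{x_0-u}{h_n}\Big)f_Y'(u)\,du\Big)^2 =: I_1 + I_2 + I_3.$$
From [20] we know that
$$L_n^{(1)} := \frac{C}{n}h_n^{2([m+1]\beta+\beta_0)-1}\exp\big(2\alpha(d/h_n)^{\beta}\big) \le I_1$$
and
$$I_1 \le \frac{C}{n}h_n^{2([m+1]\beta+\beta_0-1)}\exp\big(2\alpha(d/h_n)^{\beta}\big)\big(\ln(1/h_n)\big)^{2m} =: U_n^{(1)}.$$
Now, from continuity and integrability of $f_Y$ we obtain, via Lemma 3.1 in [11],
$$\Big(\frac{1}{h_n}\int g_n\Big(\frac{x_0-u}{h_n}\Big)f_Y'(u)\,du\Big)^2 \ge Ch_n^{2([m+1]\beta+\beta_0)}\exp\big(2\alpha(d/h_n)^{\beta}\big)$$
and
$$\Big(\frac{1}{h_n}\int\Big|g_n\Big(\frac{x_0-u}{h_n}\Big)\Big|h(u)\,du\Big)^2 \le Ch_n^{2([m+1]\beta+\beta_0-1)}\exp\big(2\alpha(d/h_n)^{\beta}\big)\big(\ln(1/h_n)\big)^{2m}. \tag{25}$$
Consequently,
$$L_n^{(3)} := C\frac{\sigma_{n,1}^2}{n^2}h_n^{2([m+1]\beta+\beta_0)}\exp\big(2\alpha(d/h_n)^{\beta}\big) \le I_3$$
and
$$I_3 \le C\frac{\sigma_{n,1}^2}{n^2}h_n^{2([m+1]\beta+\beta_0-1)}\exp\big(2\alpha(d/h_n)^{\beta}\big)\big(\ln(1/h_n)\big)^{2m} =: U_n^{(3)}.$$
Further, as in (25),
$$I_2 \le o\Big(\frac{\sigma_{n,1}^2}{n^2}\Big)h_n^{2([m+1]\beta+\beta_0-1)}\exp\big(2\alpha(d/h_n)^{\beta}\big)\big(\ln(1/h_n)\big)^{2m} =: U_n^{(2)}.$$
To ensure that $I_1 \to 0$ as $n \to \infty$ we choose $h_n = d\big(2\alpha/((1-\theta_0)\ln n)\big)^{1/\beta}$, $\theta_0 \in (0,1)$. Now, $U_n^{(1)} = o(L_n^{(3)})$ and $U_n^{(2)} = o(L_n^{(3)})$ as long as $h_n \sim C(\ln n)^{-\kappa}$, $\kappa > 0$. Consequently, with our choice of $h_n$ the third part $I_3$ dominates both $I_1$ and $I_2$. The upper bound for the third part is $U_n^{(3)} = O(n^{-\kappa'})$ for some $\kappa' > 0$, as long as $0 < 2-2\gamma < \theta_0 < 1$. Consequently, via (13), the bias term dominates and the mean square rate of convergence is of the order $(\ln n)^{-2/\beta}$. □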



Appendix A

Lemma A. Assume (2), (4), (7). Then $ug_n(u) \in L_1(\mathbb R)$, $\int|ug_n(u)|\,du = O(h_n^{-2})$ and consequently $h_n^{\beta+1}\int|ug_n(u)|\,du = o(1)$ for $\beta > 1$.

Proof. Integrate by parts three times to obtain
$$|u^3g_n(u)| \le \frac{1}{2\pi}\int\Big|\Big(\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\Big)^{(3)}\Big|\,dt.$$
Consequently, if we show that the right-hand side is bounded by $Ch_n^{-2}$, we will prove that $|ug_n(u)| = O(|u|^{-2})$ (the bound depends on $h_n$) and hence $|ug_n(u)|$ is integrable. We have, with $\phi_\epsilon$ and its derivatives evaluated at $t/h_n$,
$$\Big(\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\Big)^{(3)} = \frac{\phi_K'''(t)}{\phi_\epsilon} - \frac{3\phi_K''(t)\phi_\epsilon'}{h_n\phi_\epsilon^2} - \frac{3\phi_K'(t)\phi_\epsilon''}{h_n^2\phi_\epsilon^2} + \frac{6\phi_K'(t)(\phi_\epsilon')^2}{h_n^2\phi_\epsilon^3} - \frac{\phi_K(t)\phi_\epsilon'''}{h_n^3\phi_\epsilon^2} + \frac{6\phi_K(t)\phi_\epsilon'\phi_\epsilon''}{h_n^3\phi_\epsilon^3} - \frac{6\phi_K(t)(\phi_\epsilon')^3}{h_n^3\phi_\epsilon^4}.$$
Taking the integral of each term separately on $\{|t| \le Mh_n\}$ and $\{|t| > Mh_n\}$, and using (2), (4) and the boundedness of derivatives, we obtain for the terms involving $h_n^{-3}$, e.g.,
$$\frac{1}{h_n^3}\int_{\{|t|\le Mh_n\}}\Big|\frac{\phi_K(t)\phi_\epsilon''(t/h_n)\phi_\epsilon'(t/h_n)}{\phi_\epsilon^3(t/h_n)}\Big|\,dt = O(h_n^{-2}).$$
On $\{|t| > Mh_n\}$ we utilize condition (7), the form (4) of $\phi_\epsilon$ and the corresponding behavior of its derivatives. □

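To establish exact asymptotics in the LRD case, we need a precise result on the behavior of $\int g_n(u)\,du$.

Lemma B. Assume (2), (4), (5). Then
$$\lim_{n\to\infty}h_n^{\beta}\int g_n(u)\,du = 1.$$

Proof. In view of [18, Lemma 3], $g_n \in L_1(\mathbb R)$. Let $g(u) = \frac{1}{2\pi}\int\exp(-itu)\frac{\phi_K(t)}{\phi_\epsilon(t)}\,dt$. By the inversion formula, $\phi_K(t)/\phi_\epsilon(t) = \int\exp(itu)g(u)\,du$. Since $g \in L_1(\mathbb R)$, taking $t = 0$, we obtain $\int g(u)\,du = 1$. On the other hand,
$$\lim_{n\to\infty}\int h_n^{\beta}g_n(u)\,du = \int\lim_{n\to\infty}h_n^{\beta}g_n(u)\,du.$$
The change of limit and integrals is permitted since ([18, Lemma 3])
$$h_n^{\beta}|g_n(u)| = h_n^{\beta}|g_n(u)|\mathbf 1_{\{|u|\le1\}} + h_n^{\beta}|g_n(u)|\mathbf 1_{\{|u|>1\}}$$
and
$$h_n^{\beta}|g_n(u)| \le C\mathbf 1_{\{|u|\le1\}}\int h_n^{\beta}\Big|\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\Big|\,dt + Cu^{-2}\mathbf 1_{\{|u|>1\}}\int h_n^{\beta}\Big|\Big(\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\Big)''\Big|\,dt,$$
where, by (2) and (4),
$$h_n^{\beta}\Big|\frac{\phi_K(t)}{\phi_\epsilon(t/h_n)}\Big| \le Ch_n^{\beta}\mathbf 1_{\{|t|\le Mh_n\}} + C|\phi_K(t)||t|^{\beta}.$$
The upper bounds are integrable as functions of $u$ and $t$, respectively. □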
Appendix B

Lemma C. Assume (B1), (B2) and $EZ_1^4 < \infty$. Then under the LRD assumptions and for $r = 0, 1, 2$ we have the weak convergence
$$\sigma_{n,1}^{-1}R_n^{(r)}(z) \Rightarrow f_Y^{(r+1)}(x_0+z)Z,$$
where $Z$ is a standard normal random variable, and
$$\sup_zE\big|R_n^{(r)}(z)\big|^2 = O(\sigma_{n,1}^2), \qquad \lim_{n\to\infty}\sigma_{n,1}^{-2}E\big[(R_n^{(r)}(0))^2\big] = \big(f_Y^{(r+1)}(x_0)\big)^2.$$

Proof. Let $r = 0, 1, 2$. Let $G_n$ be the empirical distribution function associated with $X_{1,0}, \ldots, X_{n,n-1}$ and let $G$ be the distribution function of $X_{1,0}$. Then
$$R_n^{(r)}(z) = n\int\big(G_n(u) - G(u)\big)f_{\epsilon+Z}^{(r+1)}(x_0 - u + z)\,du.$$
Consequently, under the condition $\int|f_{\epsilon+Z}^{(r+1)}(v)|\,dv < \infty$ we have
$$|R_n^{(r)}(z)| \le Cn\sup_u|G_n(u) - G(u)|$$
and the bound is independent of $z$. Now, we apply Theorem 2 in [26] with $p = 0$. Then $E\big[\sup_u|G_n(u) - G(u)|^2\big] = O(\sigma_{n,1}^2/n^2)$ and hence $\sup_zE\big[|R_n^{(r)}(z)|^2\big] = O(\sigma_{n,1}^2)$.

Further, we can apply Theorem 1 in [26] to obtain
$$E\Big(R_n^{(r)}(0) + f_Y^{(r+1)}(x_0)\sum_{j=1}^{n}X_{j,j-1}\Big)^2 = o(\sigma_{n,1}^2).$$
Consequently,
$$\lim_{n\to\infty}\sigma_{n,1}^{-2}E\big(R_n^{(r)}(0)\big)^2 = \big(f_Y^{(r+1)}(x_0)\big)^2.$$
□



Lemma D. Assume that $\sum_{r=0}^{3}\int\big(f_{\epsilon+Z}^{(r)}(v)\big)^2dv < \infty$ and $E|Y_1|^{\kappa} < \infty$ for some $\kappa > 0$. Then
$$f_Y^{(r)}(y) = Ef_{\epsilon+Z}^{(r)}(y - X_{1,0}).$$

Proof. It follows from [26, Lemma 6]. □

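Lemma E. Let $k \ge 1$. Assume that either $\int|f_\epsilon^{(r)}(v)|^kdv < \infty$ or $\int|f_Z^{(r)}(v)|^kdv < \infty$. Then $\int|f_{\epsilon+Z}^{(r)}(v)|^kdv < \infty$.

Proof. Assume, for example, that the condition on $f_Z$ holds. By Jensen's inequality and Fubini's theorem,
$$\int|f_{\epsilon+Z}^{(r)}(v)|^kdv = \int\big|Ef_Z^{(r)}(v-\epsilon)\big|^kdv \le E\int\big|f_Z^{(r)}(v-\epsilon)\big|^kdv < \infty.$$
□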